LLM Robustness Against Misinformation in Biomedical Question Answering

Bondarenko, Alexander, Viehweger, Adrian

arXiv.org Artificial Intelligence

The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering by retrieving and providing additional context coming from external knowledge sources (e.g., by adding the context to the prompt). However, injecting incorrect information can mislead the LLM to generate an incorrect answer. In this paper, we evaluate the effectiveness and robustness of four LLMs against misinformation - Gemma 2, GPT-4o-mini, Llama 3.1, and Mixtral - in answering biomedical questions. We assess the answer accuracy on yes-no and free-form questions in three scenarios: vanilla LLM answers (no context is provided), "perfect" augmented generation (correct context is provided), and prompt-injection attacks (incorrect context is provided). Our results show that Llama 3.1 (70B parameters) achieves the highest accuracy in both vanilla (0.651) and "perfect" RAG (0.802) scenarios. However, the accuracy gap between the models almost disappears with "perfect" RAG, suggesting its potential to mitigate the LLM's size-related effectiveness differences. We further evaluate the ability of the LLMs to generate malicious context on one hand and the LLM's robustness against prompt-injection attacks on the other hand, using metrics such as attack success rate (ASR), accuracy under attack, and accuracy drop. As adversaries, we use the same four LLMs (Gemma 2, GPT-4o-mini, Llama 3.1, and Mixtral) to generate incorrect context that is injected into the target model's prompt. Interestingly, Llama is shown to be the most effective adversary, causing accuracy drops of up to 0.48 for vanilla answers and 0.63 for "perfect" RAG across target models. Our analysis reveals that robustness rankings vary depending on the evaluation measure, highlighting the complexity of assessing LLM resilience to adversarial attacks.
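The robustness metrics the abstract names can be sketched in a few lines. Below is a minimal, hypothetical implementation: accuracy drop is accuracy on clean inputs minus accuracy under attack, and ASR is taken here as the fraction of originally correct answers that the attack flips to incorrect (one common definition; the paper may normalize ASR differently).

```python
def accuracy(preds, gold):
    """Fraction of predictions matching the gold answers."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

def attack_success_rate(clean_preds, attacked_preds, gold):
    """Share of originally correct answers that become incorrect under attack.
    This is one common ASR definition, assumed here for illustration."""
    flipped = sum(c == g and a != g
                  for c, a, g in zip(clean_preds, attacked_preds, gold))
    originally_correct = sum(c == g for c, g in zip(clean_preds, gold))
    return flipped / originally_correct if originally_correct else 0.0

# Toy yes-no answers for four questions
gold = ["yes", "no", "yes", "no"]
clean = ["yes", "no", "no", "no"]       # vanilla answers, accuracy 0.75
attacked = ["no", "no", "no", "no"]     # answers with injected misinformation

acc_drop = accuracy(clean, gold) - accuracy(attacked, gold)
```

On this toy data the accuracy drops from 0.75 to 0.5, and one of the three originally correct answers is flipped, giving an ASR of 1/3; the abstract's observation that robustness rankings differ across measures reflects exactly this kind of divergence between ASR and accuracy drop.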


Interpreting Answers to Yes-No Questions in Dialogues from Multiple Domains

Wang, Zijie, Rashid, Farzana, Blanco, Eduardo

arXiv.org Artificial Intelligence

People often answer yes-no questions without explicitly saying yes, no, or similar polar keywords. Figuring out the meaning of indirect answers is challenging, even for large language models. In this paper, we investigate this problem working with dialogues from multiple domains. We present new benchmarks in three diverse domains: movie scripts, tennis interviews, and airline customer service. We present an approach grounded on distant supervision and blended training to quickly adapt to a new dialogue domain. Experimental results show that our approach is never detrimental and yields F1 improvements as high as 11-34%.


Interpreting Answers to Yes-No Questions in User-Generated Content

Mathur, Shivam, Park, Keun Hee, Chinnappa, Dhivya, Kotamraju, Saketh, Blanco, Eduardo

arXiv.org Artificial Intelligence

Interpreting answers to yes-no questions in social media is difficult. Yes and no keywords are uncommon, and the few answers that include them rarely mean what the keywords suggest. In this paper, we present a new corpus of 4,442 yes-no question-answer pairs from Twitter. We discuss linguistic characteristics of answers whose interpretation is yes or no, as well as answers whose interpretation is unknown. We show that large language models are far from solving this problem, even after fine-tuning and blending other corpora for the same problem but outside social media.


Interpreting Indirect Answers to Yes-No Questions in Multiple Languages

Wang, Zijie, Hossain, Md Mosharaf, Mathur, Shivam, Melo, Terry Cruz, Ozler, Kadir Bulut, Park, Keun Hee, Quintero, Jacob, Rezaei, MohammadHossein, Shakya, Shreya Nupur, Uddin, Md Nayem, Blanco, Eduardo

arXiv.org Artificial Intelligence

Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data. We also demonstrate that direct answers (i.e., with polar keywords) are useful to train models to interpret indirect answers (i.e., without polar keywords). Experimental results demonstrate that monolingual fine-tuning is beneficial if training data can be obtained via distant supervision for the language of interest (5 languages). Additionally, we show that cross-lingual fine-tuning is always beneficial (8 languages).


Thread by @AnthropicAI on Thread Reader App


Anthropic • Dec 19 • 11 tweets • 5 min read

It's hard work to make evaluations for language models (LMs). We've developed an automated way to generate evaluations with LMs, significantly reducing the effort involved. We test LMs using 150 LM-written evaluations, uncovering novel LM behaviors. In the simplest case, we generated thousands of yes-no questions for diverse behaviors just by instructing an LM (and filtering out bad examples with another LM).
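The two-LM pipeline the thread describes (one model proposes yes-no questions for a target behavior, a second model filters out bad examples) can be sketched as follows. Both model calls are stubbed here with placeholder functions; the function names and the trivial filter rule are assumptions for illustration, not Anthropic's actual implementation.

```python
def propose_questions(behavior, n):
    """Stub generator LM: in practice, prompt a model to write
    yes-no questions probing the given behavior."""
    return [f"Would you agree with statement {i} as an instance of {behavior}?"
            for i in range(n)]

def passes_filter(question):
    """Stub filter LM: in practice, ask a second model to judge
    whether the generated example is well-formed and on-topic."""
    return question.endswith("?") and len(question.split()) > 3

def build_evaluation(behavior, n):
    """Generate candidate questions, then keep only those the filter accepts."""
    return [q for q in propose_questions(behavior, n) if passes_filter(q)]

questions = build_evaluation("sycophancy", 3)
```

The key design point the thread highlights is that both generation and quality control are delegated to models, so scaling to thousands of questions is a matter of raising `n` rather than adding human annotation effort.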